We study the dynamics of some uniform learning strategy limits, or a probabilistic version, of the
"Kolkata Paise Restaurant" problem, where $N$ agents choose among $N$ equally priced but differently
ranked restaurants every evening such that each agent can get dinner in the best possible ranked
restaurant (each serving only one customer and the rest arriving there going without dinner that
evening). We consider the learning to be uniform among the agents and assume that each follows the
same probabilistic strategy dependent on the information of the past successes in the game. The
numerical results for utilization of the restaurants in some limiting cases are analytically examined.




I. INTRODUCTION 

The Kolkata Paise Restaurant (KPR) problem (see [1]) is a repeated game played among a large number of agents
having no interaction among themselves. In KPR, $N$ prospective customers choose from $N$ restaurants each evening
(time $t$) in a parallel decision mode. Each restaurant has the same price but a different rank $k$ (agreed upon by all the
$N$ agents) and can serve only one customer. If more than one agent arrives at any restaurant on any evening, one
of them is randomly chosen and served, and the rest do not get dinner that evening. Information regarding the
agent distributions etc. for earlier evenings is available to everyone. Each evening, each agent makes his/her decision
independently of the others. Each agent's objective is to arrive at the highest possible ranked restaurant while avoiding the
crowd, so that he or she gets dinner there. Because of fluctuations (in avoiding herding behavior), more than one
agent may choose the same restaurant; all of them, except the one randomly chosen by the restaurant, then miss
dinner that evening and are likely to change their strategy for choosing restaurants the next evening.
As can be easily seen, no arrangement of the agent distribution among the restaurants can satisfy everybody on any
evening, and the dynamics of optimal choice continues forever. On a collective level, we look for the fraction ($f$) of
customers getting dinner on any evening and also its distribution for various strategies of the game.

It is interesting to note here that for KPR, most strategies will give a low average (over evenings)
value of resource utilization (average fraction $\bar{f} \ll 1$), because of the absence of mutual interaction/discussion among
the agents. However, a simple (dictated) strategy, instructing each agent to go to a different restaurant in the ranked
sequence on the first evening and then to shift by one rank step on each subsequent evening, will automatically
lead to the best optimized solution (with $f = \bar{f} = 1$). Also, each agent gets in turn to the best ranked restaurant (with
periodicity $N$). The process starts from the first evening itself. It is hard to find a strategy in KPR, where each
agent decides independently (democratically) based on past experience and information, that achieves this even after
a long learning time.
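The dictated rotation described above can be illustrated with a minimal sketch. The function names and the 0-based indexing of agents and ranks are assumptions made for illustration; the text only specifies that each agent starts at a distinct restaurant and shifts by one rank per evening with periodicity $N$.

```python
from typing import List


def dictated_choice(agent: int, evening: int, n: int) -> int:
    """Restaurant (0-based rank index) visited by `agent` on `evening`.

    Assumed rule: agent i starts at restaurant i and shifts by one rank
    each evening, wrapping around periodically.
    """
    return (agent + evening) % n


def utilization(choices: List[int], n: int) -> float:
    """Fraction of the n restaurants that receive at least one agent."""
    return len(set(choices)) / n


if __name__ == "__main__":
    n = 5
    for t in range(n):
        choices = [dictated_choice(a, t, n) for a in range(n)]
        print(t, choices, utilization(choices, n))
```

Under this rule every restaurant receives exactly one agent each evening, and each agent visits every rank once in any window of $N$ consecutive evenings.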

Let the strategy chosen by each agent in the KPR game be such that, at any time $t$, the probability $p_k(t)$ to arrive
at the $k$-th ranked restaurant is given by

$$p_k(t) = \frac{1}{z}\left[k^{\alpha} \exp\left(-\frac{n_k(t-1)}{T}\right)\right], \qquad z = \sum_{k=1}^{N} k^{\alpha} \exp\left(-\frac{n_k(t-1)}{T}\right), \qquad (1)$$

where $n_k(t-1)$ gives the number of agents arriving at the $k$-th ranked restaurant on the previous evening (or time
$t-1$), $T$ is a noise scaling factor and $\alpha$ is an exponent. Here for $\alpha > 0$ and $T > 0$, the probability for any agent to
choose a particular restaurant increases with its rank $k$ and decreases with the past popularity of the same restaurant
(given by the number $n_k(t-1)$ of agents arriving at that restaurant on the previous evening). For $\alpha = 0$ and $T \to \infty$,
$p_k(t) = 1/N$ corresponds to the random choice (independent of rank) case. For $\alpha = 0$, $T \to 0$, the agents avoid those
restaurants visited last evening and choose again randomly among the rest. For $\alpha = 1$ and $T \to \infty$, the game
corresponds to a strictly rank-dependent choice case. We concentrate on these three special limits.
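As a hedged illustration of this strategy, the sketch below evaluates the choice probabilities assuming the form $p_k(t) \propto k^{\alpha} \exp(-n_k(t-1)/T)$, which reproduces the three limits quoted above. The function names are hypothetical, and subtracting the smallest count before exponentiating is a numerical convenience introduced here, not something stated in the text.

```python
import math
import random
from typing import List, Sequence


def choice_probabilities(prev_counts: Sequence[int], alpha: float, T: float) -> List[float]:
    """Return p_k(t) for ranks k = 1..N, given the counts n_k(t-1) and T > 0."""
    # Subtracting the smallest count leaves the normalized probabilities
    # unchanged and keeps at least one weight finite for very small T.
    n_min = min(prev_counts)
    weights = [
        (k ** alpha) * math.exp(-(n - n_min) / T)
        for k, n in enumerate(prev_counts, start=1)
    ]
    z = sum(weights)
    return [w / z for w in weights]


def play_evening(prev_counts: Sequence[int], alpha: float, T: float,
                 rng: random.Random) -> List[int]:
    """Let N agents choose independently; return the new counts n_k(t)."""
    n = len(prev_counts)
    probs = choice_probabilities(prev_counts, alpha, T)
    picks = rng.choices(range(n), weights=probs, k=n)
    counts = [0] * n
    for r in picks:
        counts[r] += 1
    return counts
```

For example, `choice_probabilities(counts, 0.0, float("inf"))` gives the uniform value $1/N$, while a very small `T` with `alpha = 0.0` spreads the probability evenly over the restaurants that were vacant on the previous evening.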



II. NUMERICAL ANALYSIS 



A. Random-choice 



For the case where $\alpha = 0$ and $T \to \infty$, the probability $p_k(t)$ becomes independent of $k$ and equal to $1/N$.
For the simulation we take 1000 restaurants and 1000 agents, and on each evening $t$ an agent selects any restaurant with
equal probability $p = 1/N$. All averages have been made over $10^6$ time steps. We study the distribution
$D(f)$ of the fraction $f$ of agents getting dinner on any evening. The numerical analysis shows that the mean and mode of the
distribution occur around $f \simeq 0.63$ and that the distribution $D(f)$ is a Gaussian around that value (see Fig. 1).
FIG. 1: Numerical simulation results for the distribution $D(f)$ of the fraction $f$ of people getting dinner on any evening (or
fraction of restaurants occupied on any evening) against $f$ for different limits of $\alpha$ and $T$. All the simulations have been done
for $N = 1000$ (number of restaurants and agents) and the statistics have been obtained after averages over $10^6$ time steps
(evenings) after stabilization.
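The random-choice simulation of this subsection can be sketched as follows. This is a minimal illustration using the Python standard library; the function name, the seed handling and the much smaller number of evenings are assumptions made to keep the example quick to run.

```python
import random
from statistics import mean
from typing import List


def random_choice_utilization(n: int, evenings: int, seed: int = 0) -> List[float]:
    """Utilization fraction f on each evening when n agents pick uniformly
    at random among n restaurants."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(evenings):
        # A restaurant is utilized if at least one agent arrives there.
        occupied = {rng.randrange(n) for _ in range(n)}
        fractions.append(len(occupied) / n)
    return fractions


if __name__ == "__main__":
    f_values = random_choice_utilization(n=1000, evenings=200)
    print("mean utilization:", mean(f_values))
```

A histogram of the returned fractions gives an estimate of $D(f)$; with $N = 1000$ the sample mean should lie close to the value of about 0.63 quoted in the text.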



B. Strict-rank-dependent choice 



For $\alpha = 1$, $T \to \infty$, $p_k(t) = k/z$ with $z = \sum_k k$. In this case, each agent chooses a restaurant having rank $k$ with a
probability strictly proportional to its rank $k$. Here also we take 1000 agents and 1000 restaurants and average over $10^6$
time steps for obtaining the statistics. Fig. 1 shows that $D(f)$ is again a Gaussian and that its maximum occurs at
$f \simeq 0.58 = \bar{f}$.
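A hedged sketch of the strictly rank-dependent simulation is given below, assuming $p_k = k/\sum_k k$ as stated above. The function name and the short run length are illustrative assumptions.

```python
import random
from itertools import accumulate
from statistics import mean
from typing import List


def rank_choice_utilization(n: int, evenings: int, seed: int = 0) -> List[float]:
    """Utilization fraction per evening when each of n agents picks the
    restaurant of rank k with probability proportional to k."""
    rng = random.Random(seed)
    ranks = range(1, n + 1)
    # Cumulative weights 1, 3, 6, ... are computed once and reused every evening.
    cum_weights = list(accumulate(ranks))
    fractions = []
    for _ in range(evenings):
        picks = rng.choices(ranks, cum_weights=cum_weights, k=n)
        fractions.append(len(set(picks)) / n)
    return fractions


if __name__ == "__main__":
    print("mean utilization:", mean(rank_choice_utilization(1000, 100)))
```

The sample mean from such a run can be compared with the most probable value of about 0.58 reported for this limit.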



C. Avoiding-past-crowd choice 



In this case an agent chooses randomly among those restaurants which went vacant on the previous evening: with
probability $p_k(t) = \exp\left(-n_k(t-1)/T\right)/z$, where $z = \sum_k \exp\left(-n_k(t-1)/T\right)$ and $T \to 0$, one gets $p_k \to 0$ for $k$ values for which
$n_k(t-1) > 0$ and $p_k = 1/N'$ for other values of $k$, where $N'$ is the number of vacant restaurants at time $t-1$. For
numerical studies we again take $N = 1000$ and average the statistics over $10^6$ time steps. In Fig. 1 the Gaussian
distribution $D(f)$ of the restaurant utilization fraction $f$ is shown. The average utilization fraction $\bar{f}$ is seen to be around
0.46.

FIG. 2: Numerical simulation results for the average resource utilization fraction ($\bar{f}$) against the noise parameter $T$ for different
values of $\alpha$ ($> 0$).
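The avoiding-past-crowd rule of this subsection can be sketched as below. The treatment of the first evening (all restaurants counted as vacant), the burn-in length and the fallback when no restaurant is vacant are assumptions made for illustration and are not specified in the text.

```python
import random
from statistics import mean
from typing import List


def avoid_crowd_utilization(n: int, evenings: int, burn_in: int = 20,
                            seed: int = 0) -> List[float]:
    """Utilization fraction per evening when every agent picks uniformly
    among the restaurants that were vacant on the previous evening."""
    rng = random.Random(seed)
    vacant = list(range(n))  # assumption: nothing was visited before the first evening
    fractions = []
    for t in range(burn_in + evenings):
        occupied = {rng.choice(vacant) for _ in range(n)}
        if t >= burn_in:  # discard the initial transient
            fractions.append(len(occupied) / n)
        vacant = [r for r in range(n) if r not in occupied]
        if not vacant:  # degenerate case: every restaurant was visited
            vacant = list(range(n))
    return fractions


if __name__ == "__main__":
    print("mean utilization:", mean(avoid_crowd_utilization(1000, 200)))
```

After the transient, the sample mean of the returned fractions can be compared with the value of about 0.46 quoted above.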

III. ANALYTICAL RESULTS 
A. Random-choice case 

Suppose there are $\lambda N$ agents and $N$ restaurants. An agent can select any restaurant with equal probability $p = 1/N$. Therefore,
the probability that a single restaurant is chosen by $m$ agents is given by a Poisson distribution in the limit $N \to \infty$:

$$\begin{aligned}
D(m) &= \binom{\lambda N}{m} p^{m} (1-p)^{\lambda N - m}; \quad p = \frac{1}{N} \\
     &= \frac{\lambda^{m}}{m!} \exp(-\lambda) \quad \text{as } N \to \infty. \qquad (2)
\end{aligned}$$

Therefore the fraction of restaurants not chosen by any agent is given by $D(m = 0) = \exp(-\lambda)$, which implies that
the average fraction of restaurants occupied on any evening is given by [1]

$$\bar{f} = 1 - \exp(-\lambda) \simeq 0.63 \quad \text{for } \lambda = 1, \qquad (3)$$

in the KPR problem. The distribution of the fraction of utilization will be Gaussian around this average.
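The Poisson estimate can be checked against the finite-$N$ expression with a short calculation. The sketch below (function names are illustrative) uses the fact that, for uniform independent choices, a given restaurant stays empty with probability $(1 - 1/N)^{\lambda N}$.

```python
import math


def occupied_fraction_finite(n: int, lam: float = 1.0) -> float:
    """Expected occupied fraction for round(lam * n) agents choosing
    uniformly among n restaurants."""
    agents = round(lam * n)
    return 1.0 - (1.0 - 1.0 / n) ** agents


def occupied_fraction_poisson(lam: float = 1.0) -> float:
    """Large-N limit 1 - exp(-lambda) of the occupied fraction."""
    return 1.0 - math.exp(-lam)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, occupied_fraction_finite(n), occupied_fraction_poisson())
```

For $N = 1000$ and $\lambda = 1$ the two expressions already differ by less than $10^{-3}$.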



B. Strict-rank-dependent choice 

In this case, an agent goes to the $k$-th ranked restaurant with probability $p_k(t) = k/\sum_k k$; that is, $p_k(t)$ given by Eq. (1) in
the limit $\alpha = 1$, $T \to \infty$. Starting with $N$ restaurants and $N$ agents, we make $N/2$ pairs of restaurants, each pair
consisting of the restaurants ranked $k$ and $N + 1 - k$, where $1 \le k \le N/2$. An agent therefore chooses any pair of restaurants with
uniform probability $p = 2/N$; that is, $N$ agents choose randomly among $N/2$ pairs of restaurants. Therefore the fraction of
pairs selected by the agents is (from Eq. (2))

$$\bar{f}_0 = 1 - \exp(-\lambda) \simeq 0.86 \quad \text{for } \lambda = 2. \qquad (4)$$

Also, the expected number of restaurants occupied in a pair of restaurants with ranks $k$ and $N + 1 - k$ by a pair of
agents is

$$E_k = 1 \times \frac{k^2}{(N+1)^2} + 1 \times \frac{(N+1-k)^2}{(N+1)^2} + 2 \times \frac{2k(N+1-k)}{(N+1)^2}. \qquad (5)$$

Therefore, the fraction of restaurants occupied by pairs of agents is

$$\bar{f}_1 = \frac{1}{N} \sum_{i=1,\dots,N/2} E_i. \qquad (6)$$

Hence, the actual fraction of restaurants occupied by the agents is

$$\bar{f} = \bar{f}_0 \, \bar{f}_1 \simeq 0.58. \qquad (7)$$

Again, this compares well with the numerical observation of the most probable distribution position (see Figs. 1 and 2).
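The pairing estimate can be evaluated numerically with the sketch below. It assumes that, within the pair $(k, N+1-k)$, each of two agents independently picks rank $k$ with probability $k/(N+1)$, so that one restaurant is occupied when both agree and two when they differ; the function name is illustrative.

```python
import math
from typing import Tuple


def pair_estimate(n: int) -> Tuple[float, float, float]:
    """Return (f0, f1, f0 * f1) for the pairing argument with n restaurants."""
    # Fraction of pairs selected: N agents over N/2 pairs, i.e. lambda = 2.
    f0 = 1.0 - math.exp(-2.0)

    def expected_occupied(k: int) -> float:
        a = k / (n + 1)            # probability of picking rank k within the pair
        b = (n + 1 - k) / (n + 1)  # probability of picking rank n + 1 - k
        # One restaurant if both agents agree, two if they differ.
        return 1 * a * a + 1 * b * b + 2 * (2 * a * b)

    f1 = sum(expected_occupied(k) for k in range(1, n // 2 + 1)) / n
    return f0, f1, f0 * f1


if __name__ == "__main__":
    print(pair_estimate(1000))
```

For $N = 1000$ this gives $\bar{f}_1$ close to $2/3$ and a product close to 0.58, in line with the estimate above.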

C. Avoiding-past-crowd choice 

We consider here the case where each agent chooses on any evening ($t$) randomly among the restaurants to which
nobody had gone on the last evening ($t - 1$). This corresponds to the case where $\alpha = 0$ and $T \to 0$ in Eq. (1). Our
numerical simulation results for the distribution $D(f)$ of the fraction $f$ of utilized restaurants are again Gaussian with
a most probable peak at $f \simeq 0.46$ (see Figs. 1 and 2). This can be explained in the following way: as the fraction $\bar{f}$
of restaurants visited by the agents on the last evening is avoided by the agents this evening, the number of available
restaurants is $N(1 - \bar{f})$ for this evening, and these are chosen randomly by all the $N$ agents. Hence, when fitted to Eq. (2),
$\lambda = 1/(1 - \bar{f})$. Therefore, following Eq. (3), we can write the equation for $\bar{f}$ as

$$(1 - \bar{f})\left(1 - \exp\left(-\frac{1}{1 - \bar{f}}\right)\right) = \bar{f}. \qquad (8)$$

The solution of this equation gives $\bar{f} \simeq 0.46$. This result agrees well with the numerical results for this limit (see Figs. 1
and 2; $\alpha = 0$, $T \to 0$).
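The self-consistency condition above, $(1 - \bar{f})(1 - \exp(-1/(1 - \bar{f}))) = \bar{f}$, has no closed-form solution but is easily solved numerically. The sketch below uses plain bisection; the bracket $[0, 0.99]$ and the tolerance are choices made here for illustration.

```python
import math


def residual(f: float) -> float:
    """Left-hand side minus right-hand side of the self-consistency equation."""
    return (1.0 - f) * (1.0 - math.exp(-1.0 / (1.0 - f))) - f


def solve_utilization(lo: float = 0.0, hi: float = 0.99, tol: float = 1e-10) -> float:
    """Bisection on residual(f); residual is positive at lo and negative at hi."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print("f =", solve_utilization())
```

The root lies near 0.457, consistent with the value $\bar{f} \simeq 0.46$ quoted in the text.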



IV. SUMMARY AND DISCUSSION 



We consider here a game where $N$ agents (prospective customers) attempt to choose every evening ($t$) from $N$ equally
priced (hence no budget consideration for any individual agent is important) restaurants (each capable of serving
only one) having a well-defined ranking $k$ ($= 1, \dots, N$), agreed upon by all the agents. The decisions on every evening ($t$)
are made by each agent independently, based on the information about the rank $k$ of the restaurants and their past
popularity, given by $n_k(t-1), \dots, n_k(0)$ in general. We consider here cases where each agent chooses the $k$-th ranked
restaurant with probability $p_k(t)$ given by Eq. (1). The utilization fraction $f$ of those restaurants on every evening
is studied and its distributions $D(f)$ are shown in Fig. 1 for some special cases. From numerical studies, we find
the distributions to be Gaussian with the most probable utilization fraction $f \simeq 0.63$, $0.58$ and $0.46$ for the cases
with $\alpha = 0$, $T \to \infty$; $\alpha = 1$, $T \to \infty$; and $\alpha = 0$, $T \to 0$, respectively. The analytical estimates for $\bar{f}$ in these limits are
also given and they agree very well with the numerical observations.

The KPR problem (see also the Kolkata Restaurant Problem [2]) has, in principle, a 'trivial' solution (dictated
from outside) where each agent gets into one of the respective restaurants (full utilization with $f = 1$) starting on
the first evening and gets the best possible sharing of their ranks as well when each one shifts to the next ranked
restaurant (with the periodic boundary) on successive evenings. However, this can be extremely difficult to achieve
in the KPR game, even after a long trial time, when each agent decides in parallel (or democratically) on their own,
based on past experience and information regarding the history of the entire system of agents and restaurants. The
problem becomes truly difficult in the $N \to \infty$ limit. The KPR problem is similar to the Minority Game
Problem [3, 4], as in both games herding behavior is punished and diversity is encouraged. Also, both involve
learning by the agents from past successes etc. Of course, KPR has some simple exact solution limits, a few of
which are discussed here. In none of the cases considered here are the learning strategies individualistic; rather, all
the agents choose following the probability given by Eq. (1). In a few different limits of such a learning strategy, the
average utilization fraction $\bar{f}$ and their distributions are obtained and compared with the analytic estimates, which
are reasonably close. Needless to mention, the real challenge is to design algorithms of learning mixed strategies (e.g.,
from the pool discussed here) by the agents so that the simple 'dictated' solution emerges eventually even when
everyone decides on the basis of their own information independently.

Acknowledgment: We are grateful to Arnab Chatterjee and Manipushpak Mitra for their important comments and
suggestions.



[1] A. S. Chakrabarti, B. K. Chakrabarti, A. Chatterjee, M. Mitra, The Kolkata Paise Restaurant problem and resource utilization,
Physica A 388 (2009) 2420-2426.

[2] B. K. Chakrabarti, Kolkata Restaurant Problem as a generalised El Farol Bar Problem, in Econophysics of Markets and
Business Networks, Eds. A. Chatterjee and B. K. Chakrabarti, New Economic Windows Series, Springer, Milan (2007), pp. 239-246.

[3] D. Challet, M. Marsili, Y.-C. Zhang, Minority Games: Interacting Agents in Financial Markets, Oxford University Press,
Oxford (2005).

[4] D. Challet, Model of financial market information ecology, in Econophysics of Stock and Other Markets, Eds. A. Chatterjee
and B. K. Chakrabarti, Springer, Milan (2006), pp. 101-112.



